Abstract

Indirect Immunofluorescence Imaging of Human Epithelial Type 2 (HEp-2) cells is an effective way to identify the presence of Anti-Nuclear Antibody (ANA). Most existing works on HEp-2 cell classification focus on feature extraction, feature encoding and classifier design. Very few efforts have been devoted to studying the importance of pre-processing techniques. In this paper, we analyze the importance of pre-processing and investigate the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task. We validate the GSS pre-processing under the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks. Under the BoW framework, the introduced pre-processing approach, using only one Local Orientation Adaptive Descriptor (LOAD), achieved superior performance on the Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence (ET-PRT-IIF) image analysis. Our system, using only one feature, outperformed the winner of the ICPR 2014 contest, which combined four types of features. Meanwhile, the proposed pre-processing method is not restricted to this work; it can be generalized to many existing works.
1. Introduction

Indirect Immunofluorescence Imaging of Human Epithelial Type 2 (HEp-2) cells is a commonly used way to identify the presence of Anti-Nuclear Antibody (ANA), which is considered an effective approach to diagnosing various autoimmune diseases. Previously, human experts were required to identify the types of HEp-2 cells according to their experience. This process is highly subjective because it depends on the experience of the experts, and errors are common given the large intra-class variations and small inter-class variations in HEp-2 cells. The recognition of HEp-2 cells is a typical pattern recognition problem. Several contests held in the past three years on HEp-2 cell classification have greatly raised interest in the development of an effective recognition system. There were 28, 14, and 11 submissions to the ICPR 2012, ICIP 2013 and ICPR 2014 HEp-2 cell classification contests, respectively. Those methods range from applying fast morphological methods, to designing or transferring new features, feature encoding approaches or classifiers, and to fusing different approaches [9][10].

Existing works on HEp-2 cell classification mainly focus on three aspects: feature extraction, feature encoding and classifier design. Among the three, feature extraction has received most of the attention. Many well-known features have been applied to this application, such as the Scale Invariant Feature Transform (SIFT), Local Binary Pattern (LBP) and Gray Level Co-occurrence Matrix (GLCM). Meanwhile, some new features were proposed for the task, such as Co-occurrence of Adjacent LBP (CoALBP) [14] and Shape Index Histogram (SIH). Feature encoding is an important stage in the traditional Bag-of-Words (BoW) model. Many advanced feature encoding approaches, such as Hard Assignment, Kernel Codebook, Sparse Coding (SC), Locality-constrained Linear Coding (LLC) and Improved Fisher Vector (IFV), have been studied on this task. A novel feature encoding technique named Fisher Tensor was also proposed. The choice of classifier is important to the final classification accuracy. The Support Vector Machine (SVM) is the most widely used classifier on this task. Some works have also studied the effectiveness of other classifiers, such as Shareboost and K-NN.

By contrast, very few efforts have been devoted to studying the importance of pre-processing. We agree that the three aspects mentioned above are important for HEp-2 cell classification, but we also believe that an effective pre-processing technique will benefit this task greatly. Thus, in this paper, we analyze the importance of pre-processing and investigate the role of Gaussian Scale Space (GSS) theory as a pre-processing technique. We evaluate the GSS pre-processing in two different frameworks: the LBP framework and the BoW framework. Extensive experiments show that, in both frameworks, the proposed GSS pre-processing greatly boosts recognition performance over the same approach without the GSS.

One submission based on the proposed method achieved superior performance (with a Mean-Class-Accuracy (MCA) of 84.63%) on the ET-PRT-IIF^1. Using only one type of feature, our approach greatly outperformed the winner of the ICIP 2013 contest, which combined two features, and exceeded the winner of the ICPR 2014 contest, which combined four types of features. The source code submitted to the contest is provided at the link^2.

1.1. Related Works 

Existing works on HEp-2 cell classification can be grouped into three categories:

Feature Extraction. In pattern recognition tasks, feature extraction is always one of the most important stages, and it greatly affects the final classification performance. For HEp-2 cell classification, the LBP and many of its variants have been applied, such as Completed LBP (CLBP) [20], Co-occurrence of Adjacent LBP (CoALBP) [14] and Pairwise Rotation Invariant Co-occurrence of LBP (PRICoLBP) [21]. In the ICPR 2012 contest, the CoALBP ranked 1st among all 28 submissions. Later, in the ICIP 2013 contest, a combination of the PRICoLBP and the bag of SIFT won the contest among all 14 submissions. In the recently held ICPR 2014 contest, an ensemble of four features [10] (Multi-resolution Local Patterns, Root-SIFT, Random Projections and Intensity Histograms) achieved the best Mean-Class-Accuracy (MCA) among all 11 submissions on Task 1 (cell classification). The same method also ranked 1st on Task 2 (specimen classification) [22]. The previous three contests show that, until now, feature extraction has dominated the task.

Feature Encoding. In the BoW framework, feature encoding is an important stage. Since the BoW model is widely used for HEp-2 cell classification, several works [23][24][25] focus on transferring advanced feature encodings (an existing evaluation study describes several of these advanced encoding methods in detail) or designing new encoding techniques. For instance, in the ICIP 2013 contest, Shen et al. used hard assignment for the Bag of SIFT model. In the ICPR 2014 contest, Ensafi et al. used Sparse Coding (SC) to encode the SIFT and SURF features, and Manivannan et al. [10] used Locality-constrained Linear Coding (LLC) to encode four types of features. Recently, Faraki et al. introduced a novel encoding approach called Fisher Tensors for HEp-2 cell and texture classification.

Classifier. Classification is the last stage of the whole system, so it is directly related to recognition performance. The Nearest Neighbor (NN) classifier is the simplest classification method and does not require any training, but evaluation becomes very slow when the scale of the problem is large. Stoklasa et al. proposed an efficient K-NN scheme for the task. Ersoy et al. proposed to use Shareboost for classification. Most submissions in the previous contests used a linear or kernel SVM. Usually, the SVM shows better performance than the NN classifier. We believe other classifiers, such as Random Forests and Gaussian Processes, are also worth studying in the future.

^1 http://mivia.unisa.it/iif2014/

^2 https://www.dropbox.com/s/q7xuht2ddwgr81f/PRLettersMaterial.zip?dl=0

1.2. Contributions 

The novelty of this paper lies in analyzing the role of pre-processing. We propose to use a multi-resolution Gaussian Scale Smoothing to pre-process the image before the feature extraction stage. For the GSS,

• we visually explain the underlying reasons why the GSS pre-processing can greatly benefit the HEp-2 cell classification task.

• we experimentally show the significant improvement brought by the proposed pre-processing approach. One submission to the ET-PRT-IIF based on the proposed method achieves superior performance on this task.

The remainder of this paper is organized as follows. In Section 2, we introduce the proposed GSS pre-processing approach and analyze the underlying reasons that make the GSS work for HEp-2 cell classification. In Section 3, we evaluate the proposed approach in detail and compare it with state-of-the-art approaches. In Section 4, we conclude this paper with a discussion.

2. The Role of Gaussian Scale Space Theory 
Figure 1: Filtered images under different Gaussian scales. (A) and
(B) belong to the “Positive” type, and (C), (D) and (E) come from the 
“Intermediate” type of HEp-2 cells. The “Intermediate” cell images 
are enhanced for better visualization. 


2.1. Gaussian Scale Space Theory 

Scale Space Theory (SST) is a multi-scale signal representation framework. Given a 2-D image $I(x, y)$, the scale space of the image can be defined as:

$$L(x, y, \sigma) = F(x, y, \sigma) * I(x, y),$$

where $*$ is the convolution operation in $x$ and $y$, $F(x, y, \sigma)$ is a filtering function, and $\sigma$ is the scale factor.
In the SST, the Gaussian function is the most widely used filtering function. It was shown in [29][28] that the Gaussian function is the only possible scale-space kernel under a variety of reasonable assumptions. Thus, in this paper, we use the Gaussian function as the scale-space kernel. This is termed the Gaussian Scale Space (GSS) in the literature. The 2-D Gaussian filter can be defined as:

$$F(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right).$$
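As an illustration of the Gaussian filtering defined above, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes a grayscale image stored as a list of rows, exploits the separability of the 2-D Gaussian, truncates the kernel at three standard deviations, and replicates border pixels; the function names and the `truncate` parameter are hypothetical choices made for this example.

```python
import math


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Sampled 1-D Gaussian, truncated at `truncate` * sigma (an assumption).

    The weights are normalised to sum to 1, which stands in for the
    analytic 1 / (2 * pi * sigma^2) factor of the continuous 2-D filter.
    """
    radius = max(1, int(math.ceil(truncate * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Filter every row with `kernel`, replicating the border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    output = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for j, weight in enumerate(kernel):
                xx = min(max(x + j - radius, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        output.append(new_row)
    return output


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_smooth(image, sigma):
    """L(x, y, sigma): the image convolved with a 2-D Gaussian of scale sigma."""
    kernel = gaussian_kernel_1d(sigma)
    filtered_rows = convolve_rows(image, kernel)
    # The 2-D Gaussian is separable, so filter the columns with the same kernel.
    return transpose(convolve_rows(transpose(filtered_rows), kernel))


def gaussian_scale_space(image, sigmas):
    """Return the original image followed by one filtered image per scale."""
    return [image] + [gaussian_smooth(image, sigma) for sigma in sigmas]
```

Calling `gaussian_scale_space(image, sigmas)` with K scales returns K + 1 images, which is the input that the feature extraction stage operates on in the rest of this section.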

In this paper, we propose to regard the GSS as a pre-processing technique for the HEp-2 cell classification task. Before the feature extraction stage, we use the GSS to pre-smooth each image and then extract features from the filtered images.

Analysis of the underlying reasons why the GSS works as a pre-processing for the HEp-2 cell classification task. We believe the following four aspects may explain the reasons:

• Remove noise. As shown in Figure 1, the HEp-2 cells in the data set have strong noise, especially in the “Intermediate” images. Since the “Intermediate” images are hardly visible to human eyes in their original state, we use a simple image enhancement algorithm^3 to enhance them. The enhanced “Intermediate” images show that there is strong noise in the “Intermediate” cells. In this situation, the GSS as a smoothing pre-processing, as shown in Figure 1, can effectively remove the noise.

• Enhance texture information. As the Gaussian scale increases, the filtered image becomes smoother, so the global texture information tends to dominate the structure. With multi-resolution filtering, the subsequent features can capture cross-scale texture and shape information. Thus, the discriminative power of the features is enhanced.

• Boost the discriminative power of the final image representation by increasing the number of features. Suppose we pre-process the image with K Gaussian scales; then we get K filtered images. When features are densely sampled from the original and the filtered images, the number of features increases by a factor of K + 1 compared with direct sampling from the original image alone. In this way, multi-scale features are sampled, and the discriminative power of the final image representation is enhanced.

• Ease the misalignment of the scales among different images under the Bag-of-Words framework. For reasons such as the camera being partially out of focus or slight density variation of the serum dilution^4, the image scale may vary. Bagging features extracted from the multi-resolution GSS filtering is an effective way to ease this problem.


^3 $\hat{I}(x, y) = \frac{I(x, y) - \min_I}{\max_I - \min_I}$, where $\min_I$ and $\max_I$ are the minimum and maximum values of $I$, respectively.

^4 The ICPR 2014 contest data set used a serum dilution of 1:80.


2.2. Gaussian Scale Space Theory as a Pre-processing 

In this paper, we study the GSS theory as a pre-processing in two different frameworks: the LBP framework and the BoW model.



Figure 2: The flow chart of image representation under the LBP framework with the Gaussian Scale Space theory as a pre-processing.

GSS as a Pre-processing in the LBP model. The flow chart of Scale Space Theory as a pre-processing in the LBP framework is shown in Figure 2. The input image is filtered with Gaussian filters at different scales. Then, an LBP histogram is extracted from each filtered image. Finally, all LBP histograms are concatenated into the final image representation.
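The flow above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the authors' code: the input images are assumed to be already filtered, only the (8, 1) configuration is shown, the eight neighbours are taken from the pixel grid rather than interpolated on a circle, and the function names are hypothetical. Each pixel is coded by thresholding its neighbours against the centre, and the rotation invariant uniform mapping assigns uniform patterns to their number of set bits and all other patterns to one shared bin.

```python
def lbp_riu2_histogram(image):
    """Rotation invariant uniform LBP(8, 1) histogram with n + 2 = 10 bins."""
    # The 8 neighbours in circular order (square-grid approximation).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    n = len(offsets)
    histogram = [0] * (n + 2)
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            centre = image[y][x]
            # s(g_k - g_c): 1 when the neighbour is at least as bright as the centre.
            bits = [1 if image[y + dy][x + dx] >= centre else 0
                    for dy, dx in offsets]
            transitions = sum(bits[k] != bits[(k + 1) % n] for k in range(n))
            # Uniform patterns (at most 2 transitions) are labelled by their
            # number of set bits; all other patterns share the last bin.
            label = sum(bits) if transitions <= 2 else n + 1
            histogram[label] += 1
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram


def gss_lbp_representation(images):
    """Concatenate one LBP histogram per image (original plus filtered)."""
    representation = []
    for image in images:
        representation.extend(lbp_riu2_histogram(image))
    return representation
```

With the original image and K filtered images as input, the returned vector has (K + 1) × 10 entries for this single (8, 1) configuration.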

To build an LBP histogram for each image, the LBP pattern at each pixel is first computed as:

$$LBP(n, r) = \sum_{k=0}^{n-1} s(g_k - g_c)\, 2^k, \qquad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$

where $n$ is the number of neighbors, $r$ is the radius, $s(\cdot)$ is the sign function, $g_c$ is the gray value of the central pixel, and $g_k$ is the gray value of its $k$-th neighbor. After obtaining the LBP pattern for each pixel, a histogram can be built. A multi-scale strategy can be used to enhance the discriminative power of the descriptor by choosing different $(n, r)$. In practice, to control the dimension of the final representation, a rotation invariant uniform LBP (RIU-LBP) is used. The dimension of the RIU-LBP equals $n + 2$. Suppose the feature dimension extracted from one image is $D$; the final dimension of the image representation that concatenates the features from the original image and $K$ filtered images is $(K + 1) \times D$.

Most LBP variants can follow this framework, including the traditional LBP, the Completed LBP (CLBP) [20], the Co-occurrence of Adjacent LBP (CoALBP) [14] and the Pairwise Rotation Invariant Co-occurrence of LBP (PRICoLBP) [21]. In this paper, instead of pursuing higher performance, our aim is to demonstrate the effectiveness of the proposed pre-processing method; thus, we do not use these advanced LBP variants.

Figure 3: Examples of the specimens and cells. Panel (B) shows samples of cells from the six categories (Centromere, Golgi, Homogeneous, Nucleolar, NuMem, Speckled). On the left panel of the figure, we show two specimen images. On the right panel, three samples for all six categories are shown, in which the first two rows show the “Positive” type and the third row shows one “Intermediate” type. The fourth row shows the corresponding enhanced images for the third row. It is easy to see that the intra-class variation is large, especially when considering the “Positive” and “Intermediate” types in the same category.

Figure 4: The flow chart of image representation under the BoW framework with the Gaussian Scale Space theory as a pre-processing.

GSS as a Pre-processing in the BoW model. The flow chart of the BoW model using the GSS pre-processing is shown in Figure 4. First, each input image is filtered with Gaussian filters of different scales. Then, local features, such as the Local Orientation Adaptive Descriptor (LOAD) [30], are densely extracted from the original image and all the filtered images. Finally, the LOAD features extracted from all scales are pooled into one IFV encoding to form the final image representation. Pooling the features of different scales into one IFV representation eases the misalignment of scales among different images. The BoW framework with the GSS pre-processing differs from the LBP framework with the GSS pre-processing: in the LBP framework, we extract one LBP histogram from each filtered image and concatenate all LBP histograms into the final representation, whereas in the BoW framework, only one BoW histogram is constructed from the features extracted from all filtered images.

The choice of local features is flexible. All popular features, such as SIFT [16], SURF, MROGH [32] and LIOP [33], can be used. In this paper, we use our newly proposed LOAD. In essence, the LOAD descriptor can be considered as the LBP built on an adaptive coordinate system. In LOAD, we use four scales (LBP(8, 1), LBP(8, 2), LBP(8, 3), LBP(8, 4)). In each scale, only the uniform LBP patterns (59 patterns) are used. Therefore, the feature dimension of the LOAD is 59 × 4 = 236. With filtered images at different scales, the features extracted from patches can capture multi-resolution information. As pointed out before, the features extracted from the filtered images with large scales capture more global texture information, while the features from the original image or the filtered images with small scales depict fine details.
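The figure of 59 patterns per scale can be checked with a short enumeration. The sketch below assumes the usual uniform-LBP convention for 8 neighbours, in which a pattern is uniform when its circular bit string has at most two 0/1 transitions and all non-uniform patterns share one extra bin; whether LOAD bins its patterns in exactly this way is an assumption of this example, and the variable names are hypothetical.

```python
def count_transitions(pattern, n_bits=8):
    """Number of 0/1 changes when the bits of `pattern` are read circularly."""
    bits = [(pattern >> k) & 1 for k in range(n_bits)]
    return sum(bits[k] != bits[(k + 1) % n_bits] for k in range(n_bits))


# Uniform patterns have at most two circular transitions.
uniform_patterns = [p for p in range(2 ** 8) if count_transitions(p) <= 2]

# One histogram bin per uniform pattern plus one shared bin for the rest.
bins_per_scale = len(uniform_patterns) + 1

# Four scales: LBP(8, 1), LBP(8, 2), LBP(8, 3), LBP(8, 4).
load_dimension = bins_per_scale * 4
```

Under this convention the enumeration yields 58 uniform patterns, hence 59 bins per scale and a 236-dimensional descriptor, which agrees with the dimension stated above.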

The proposed pre-processing approach also places no requirement on the feature encoding. Any feature encoding can be applied in the subsequent processing, such as Vector Quantization (VQ), Soft Assignment, Kernel Codebook, Sparse Coding (SC), Locality-constrained Linear Coding (LLC) and Improved Fisher Vector (IFV). In this paper, we use the IFV due to its effective encoding ability.

The IFV representation measures the average first- and second-order differences between the local features and the components of a Gaussian Mixture Model (GMM). First, Principal Component Analysis (PCA), keeping D components, is used to remove the correlation between any two dimensions. Then, a GMM with K components is learned from the PCA-reduced features. The average first- and second-order differences of the PCA-reduced features with respect to the GMM parameters are calculated and concatenated. Therefore, the final dimension of the IFV representation is 2 × D × K. The IFV proves to be effective even with only a linear SVM, which greatly speeds up the final evaluation stage. The computational cost of the IFV is also low. We refer the reader to the references [26][34] for a more thorough understanding of the IFV.
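To make the 2 × D × K dimension concrete, here is a minimal pure-Python sketch of Fisher vector encoding with the commonly used "improved" normalisation (signed square root followed by L2 normalisation). It is a sketch under assumptions, not the implementation used in the paper: the descriptors are assumed to be PCA-reduced already, the GMM is assumed to be diagonal and already fitted, and the function name and argument layout are hypothetical.

```python
import math


def improved_fisher_vector(descriptors, weights, means, variances):
    """Encode a set of D-dim descriptors with a K-component diagonal GMM.

    Returns a vector of length 2 * D * K: first-order terms for every
    component, followed by second-order terms for every component.
    """
    n_desc = len(descriptors)
    n_comp = len(weights)
    dim = len(means[0])
    first = [[0.0] * dim for _ in range(n_comp)]
    second = [[0.0] * dim for _ in range(n_comp)]

    for x in descriptors:
        # Log of (mixture weight * diagonal Gaussian density) per component.
        log_p = []
        for k in range(n_comp):
            value = math.log(weights[k])
            for d in range(dim):
                var = variances[k][d]
                diff = x[d] - means[k][d]
                value -= 0.5 * (math.log(2.0 * math.pi * var) + diff * diff / var)
            log_p.append(value)
        # Posterior (soft assignment) of each component, computed stably.
        peak = max(log_p)
        unnormalised = [math.exp(v - peak) for v in log_p]
        norm = sum(unnormalised)
        for k in range(n_comp):
            gamma = unnormalised[k] / norm
            for d in range(dim):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                first[k][d] += gamma * z
                second[k][d] += gamma * (z * z - 1.0)

    vector = []
    for k in range(n_comp):
        scale = n_desc * math.sqrt(weights[k])
        vector.extend(v / scale for v in first[k])
    for k in range(n_comp):
        scale = n_desc * math.sqrt(2.0 * weights[k])
        vector.extend(v / scale for v in second[k])

    # "Improved" normalisation: signed square root, then L2 normalisation.
    vector = [math.copysign(math.sqrt(abs(v)), v) for v in vector]
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length > 0 else vector
```

Because all descriptors, whatever image or scale they come from, are accumulated into the same sums, pooling features from the original and the filtered images still yields a single vector of length 2 × D × K.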
Figure 5: Evaluation under (A) the LBP framework and (B) the BoW framework. Methods without GSS pre-processing are marked in blue and methods with GSS pre-processing are marked in red.


3. Experimental Analysis 

3.1. Dataset, Implementation Details and Evaluation Strategy 

Dataset. The I3A-2014 Task-1 data set was collected between 2011 and 2013 at the Sullivan Nicolaides Pathology laboratory, Australia. The whole data set consists of 68,429 cells from six categories. The six classes, with their counts in the training part, are: Homogeneous (2,494 cells from 16 specimens), Speckled (2,831 cells from 16 specimens), Nucleolar (2,598 cells from 16 specimens), Centromere (2,741 cells from 16 specimens), Golgi (724 cells from 4 specimens), and Nuclear Membrane (2,208 cells from 15 specimens). The I3A-2014 Task-1 of the ICPR 2014 contest used the same data set as the previous ICIP 2013 contest.

The training part contains 13,596 cell images collected from 83 different specimen images. The testing part consists of 54,833 cell images. The test data is privately maintained by the organizers and is not yet publicly available. All results on the test data set were reported by the contest organizers. Two specimen images from the I3A-2014 Task-2 are shown in the left panel of Figure 3, and some cells from each class are shown in the right panel of Figure 3.

Implementation Details. We evaluate the GSS as a pre-processing step in two different ways: the LBP and the BoW frameworks. For both methods, we use the original image and seven filtered images ($\sigma = b^{n-1}$, $b = 1.5$ and $n = 1, 2, \ldots, 7$) by default. We evaluate the influence of different scale factors $\sigma$ and different numbers of filters below.
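The default scale schedule can be written out explicitly. The short sketch below assumes that the scale of the n-th filter is the base raised to the power n − 1, which is how the damaged expression in the text is read here; the function name and its default arguments are hypothetical.

```python
def gss_sigmas(base=1.5, num_filters=7):
    """Scale factors of the filtered images, assuming sigma_n = base ** (n - 1)."""
    return [base ** (n - 1) for n in range(1, num_filters + 1)]


# Default setting: base 1.5 and seven filtered images.
default_sigmas = gss_sigmas()
```

Under this assumption the seven scales grow geometrically from 1 to 1.5 to the power 6 (about 11.4), so the coarsest filtered image is smoothed over a neighbourhood roughly an order of magnitude wider than the finest one.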

In the LBP approach, we use three scales ((8, 1), (16, 2) and (24, 3)) and the rotation invariant uniform LBP. Therefore, the dimension of the feature vector extracted from one image is 54. Since we concatenate all features from the original image and the filtered images, the dimension of the final feature vector is (1 + 7) × 54 = 432. This framework is extremely fast: it takes less than 0.2s to process one image on a desktop with a dual-core 3.4 GHz CPU.
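As a worked check of these dimensions, using the rule from Section 2.2 that a rotation invariant uniform LBP with n neighbors has n + 2 bins, and K = 7 filtered images:

```latex
\begin{align*}
D &= (8 + 2) + (16 + 2) + (24 + 2) = 10 + 18 + 26 = 54, \\
(K + 1) \times D &= (7 + 1) \times 54 = 432.
\end{align*}
```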


In the BoW framework with Improved Fisher Vector (IFV) encoding, we densely extract the LOAD feature^5 from circular patches of radius 13 with a stride of two pixels along the y-axis and one pixel along the x-axis. On an image of size 70 × 70, around 4,600 LOAD features are extracted. For the IFV, we use Principal Component Analysis (PCA) to reduce the dimension to 100 and then use a Gaussian Mixture Model (GMM) with 256 components to cluster the PCA-reduced features. Thus, the dimension when using one dictionary is 2 × 100 × 256 = 51,200. A detailed description of the IFV can be found in [34][26]. For our submission, to improve the stability of our algorithm, we use two dictionaries. In our experiments, however, we observed only a slight improvement (0.07 percentage point) with two dictionaries compared with one dictionary under the Leave-One-Specimen-Out strategy. The whole system takes less than 1.6s to process an image (around 1.0s for LOAD feature extraction, 0.6s for feature encoding, and almost no time for classification because a linear SVM is used). For the linear SVM, we use LIBLINEAR to train and evaluate the model. For the IFV, we use the VLFeat toolbox.

Two evaluation strategies are used in the paper:

• Leave-One-Specimen-Out (LOSO) Evaluation. In the LOSO strategy, cell images from 82 specimens are used for training, and the cell images from the remaining specimen are used for testing. The final Mean-Class-Accuracy (MCA) is reported based on the 83 splits. This strategy is an effective way to evaluate the algorithm when the test data set is not available.

• Evaluation on the test data set. Evaluation on the test data set is a fair way to compare different algorithms: every submission is blind to the test data, and the scale of the test data is large.
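The LOSO protocol and the MCA metric can be sketched as follows. This is an illustrative sketch rather than the evaluation code used in the contest: the function names are hypothetical, each cell is assumed to carry the identifier of the specimen it came from, and how the per-split results are combined into the final MCA is not specified in the text, so only the split generation and the per-class averaging are shown.

```python
from collections import defaultdict


def loso_splits(specimen_ids):
    """Yield (held_out, train_indices, test_indices), one split per specimen."""
    for held_out in sorted(set(specimen_ids)):
        train = [i for i, s in enumerate(specimen_ids) if s != held_out]
        test = [i for i, s in enumerate(specimen_ids) if s == held_out]
        yield held_out, train, test


def mean_class_accuracy(true_labels, predicted_labels):
    """Average of the per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == prediction)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

Splitting by specimen rather than by cell keeps all cells of one specimen on the same side of the split. Averaging per class means that a small class such as Golgi weighs as much in the MCA as the larger classes.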


^5 Detailed information about the LOAD can be found in [30].
Figure 6: Evaluations of the parameters. Left panel: classification accuracy under different numbers of filters. Right panel: classification accuracy under different scale factors.


3.2. Comparative Analysis of Gaussian Scale Space 

To evaluate the effectiveness of the GSS as a pre-processing, we conduct two sets of experiments: one uses the GSS pre-processing and the other does not. We use the LOSO evaluation strategy. The category-wise accuracies using the LBP framework and the BoW framework are shown in Figure 5(A) and Figure 5(B), respectively.

We can see from Figure 5 that:

• In both frameworks, the GSS pre-processing significantly improves the performance over the same method without pre-processing. In the LBP framework, the GSS boosts the MCA by around 8 percentage points. In the BoW framework, the GSS improves the MCA by around 7 percentage points.

• On some categories, such as “Golgi”, “Homogeneous” and “Speckled”, the GSS pre-processing greatly boosts the performance under both frameworks.

3.3. Evaluation of Different Number of Filters and Different 
Scale Factor 

In this subsection, we evaluate the influence of the parameters on the classification performance under the BoW framework. The evaluation is conducted to answer two questions: 1) How many scales should be used? 2) What is the optimal scale factor $\sigma$? For the first question, we evaluate the BoW model under 9 different configurations; the results are shown in the left panel of Figure 6. For instance, “0” means we only use the features extracted from the original image, and “n” means we use the features extracted from the original image and n filtered images with filter factors from $1.5^{0}$ to $1.5^{n-1}$. For the second question, we evaluate the BoW model under 7 different values of $b$ ($\sigma = b^{n-1}$, $n = 1, 2, \ldots, 7$); the results are shown in the right panel of Figure 6.

From the left panel of Figure 6, we have two findings. First, using the pre-processing significantly improves the performance over not using it. For instance, using only one filtered image improves the performance by 4.1 percentage points over using no filtered image. Second, the performance almost saturates when around seven filters are used; using more filters brings no performance gain but increases the computational cost. Therefore, in the following experiments, we use seven filtered images.

From the right panel, the proposed approach works best when b is set to around 1.4; the differences between configurations are small. The following results are based on the setting b = 1.5 because we used this setting in our previous submission to the contest.

3.4. Experimental Results using the LOSO Strategy on the 
Training Data Set 

Since the test data set is not provided, we use the leave-one-specimen-out strategy as an effective way to evaluate different methods. The same strategy was also used in some previous works (Vestergaard et al. and Manivannan et al. [10]). In this strategy, the cells from 82 of the 83 specimens are used for training and the cells from the remaining specimen are used for testing. The results are averaged over the 83 splits. The category-wise accuracy of three different approaches and the classification confusion matrix of our approach are shown in Table 1. The results in Table 1 reveal that:

• Among the three algorithms, our method performs best, the approach of Manivannan et al. ranks 2nd and the method of Vestergaard et al. ranks 3rd. Note that Vestergaard et al.’s approach and ours use only one type of feature, while Manivannan et al. combine four types of features.

• Among the three methods, our approach achieves the highest performance on five categories. For instance, on the category “Homogeneous”, our approach improves on Vestergaard et al. by around 3 percentage points and outperforms the method of Manivannan et al. by around 6 percentage points. We attribute the large improvement on “Homogeneous” to two factors: (1) the proposed GSS pre-processing is effective, and (2) the LOAD feature is good at capturing the texture information that is important in the category “Homogeneous”.

• The category “Golgi” is easily misclassified as “Nucleolar”, and the categories “Homogeneous” and “Speckled” are often misclassified as each other. This may be because the confusing pairs have similar texture and shape structures.

Table 1: The category-wise accuracy of different approaches and the classification confusion matrix of our BoW with the LOAD feature using the leave-one-specimen-out strategy on the training data.

(a) Category-wise classification accuracy (%).

Method           Cen.    Gol.    Hom.    Nuc.    NuM.    Spe.    Average
IFV              87.81   44.06   79.03   81.83   87.09   69.83   74.93
Vestergaard      85.04   50.97   84.88   87.49   88.81   75.03   78.70
Manivannan       85.66   58.01   81.80   90.65   88.04   77.36   80.25
GSS_IFV (Ours)   88.43   59.53   87.77   90.69   88.99   76.76   82.03

(b) Classification confusion matrix (%) of our approach (82.03).

         Cen.    Gol.    Hom.    Nuc.    NuM.    Spe.
Cen.     88.43   0.26    0.73    1.61    0       8.97
Gol.     3.73    59.53   6.63    19.89   7.87    2.35
Hom.     0.08    0.28    87.77   1.04    1.56    9.26
Nuc.     2.39    0.96    1.62    90.69   0.85    3.50
NuM.     0       1.77    4.62    2.04    88.99   2.58
Spe.     8.55    0.07    8.27    4.95    1.41    76.76
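The average reported for our approach can be recomputed from the confusion matrix in Table 1(b), since the MCA is the mean of its diagonal. The snippet below is only an arithmetic check on the published numbers; the variable names are hypothetical, and each row is assumed to correspond to a true class because every row sums to about 100%.

```python
classes = ["Cen.", "Gol.", "Hom.", "Nuc.", "NuM.", "Spe."]

# Confusion matrix of Table 1(b), in percent; one row per class.
confusion = [
    [88.43, 0.26, 0.73, 1.61, 0.00, 8.97],
    [3.73, 59.53, 6.63, 19.89, 7.87, 2.35],
    [0.08, 0.28, 87.77, 1.04, 1.56, 9.26],
    [2.39, 0.96, 1.62, 90.69, 0.85, 3.50],
    [0.00, 1.77, 4.62, 2.04, 88.99, 2.58],
    [8.55, 0.07, 8.27, 4.95, 1.41, 76.76],
]

# Mean-Class-Accuracy: the average of the per-class accuracies on the diagonal.
diagonal = [confusion[i][i] for i in range(len(classes))]
mean_class_accuracy = sum(diagonal) / len(diagonal)

# For every class, the other class that receives most of its errors.
most_confused = {}
for i, name in enumerate(classes):
    off_diagonal = [(value, classes[j])
                    for j, value in enumerate(confusion[i]) if j != i]
    most_confused[name] = max(off_diagonal)[1]
```

The diagonal averages to 82.03 after rounding, matching the last column of Table 1(a), and the largest off-diagonal entry in the Golgi row is the Nucleolar column, in line with the observation above.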

3.5. Experimental Results on the Test Data Set 

All results reported in this subsection were provided by the organizers of the previous contests: the ICIP 2013 and ICPR 2014 contests. In this part, we compare our GSS_IFV with six well-performing methods. These approaches are listed as follows:

- Shen et al. combined the PRICoLBP [21] and the Bag of SIFT [16], and used a linear SVM classifier.

- Vestergaard et al. proposed a type of shape index histograms (SIH) with donut-shaped spatial pooling for the cell classification task. The computational complexity of the SIH method is extremely low.

- Paisitkriangkrai et al. applied a multi-class Boosting approach to automatically recognize different patterns of the HEp-2 cells. In their system, they used five types of features.

- Gao et al. utilized a Convolutional Neural Network (CNN) [39]: a seven-layer CNN that consisted of three convolution layers, three pooling layers and one fully connected layer.

- Theodorakopoulos et al. [40] combined a set of morphological features, which contained two-dimensional Boolean texture models and several textural descriptors.

- Sansone et al. used a rotation invariant dense local descriptor [42] with kernel codebook soft assignment under the BoW model.

Table 2 shows the classification confusion matrices of our approach and six other methods. Note that the winner (87.1%) [10] of the ICPR 2014 contest used 5,000 additional images from Task-2 along with all 13,596 training images. We do not list their result in Table 2 because it would be unfair to compare the approach in [10] with methods that only used the provided training data. Our method used only one type of feature, whereas many of the above-mentioned approaches combined multiple features.

We can observe from Table 2 that: 1) Among all seven methods, our approach obtains the best average performance. Our approach improves on Sansone’s method by about 0.99 percentage point, which is significant considering the difference (0.31 percentage point) between Sansone’s (second place) and Theodorakopoulos’s (third place). Meanwhile, the proposed approach works best on four categories: “Centromere”, “Homogeneous”, “Nucleolar” and “Speckled”. 2) The most confusing pairs are “Homogeneous” and “Speckled”, “Golgi” and “Homogeneous”, and “Golgi” and “Nuclear Membrane”. “Homogeneous” and “Speckled” are easily misclassified as each other because these two categories have similar shapes.

4. Conclusion and Discussion 

In this paper, we study the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task, and evaluate the GSS under two frameworks: the LBP and the BoW frameworks. Previously, most research on HEp-2 cell classification focused on feature extraction, feature encoding and classifiers, and very few efforts were devoted to studying the importance of pre-processing. The proposed approach, using only one type of feature (LOAD), achieves superior performance on the large-scale test data set maintained by the organizers. The proposed pre-processing approach can be generalized to most of the existing works [10], especially the BoW-based and LBP-based approaches. We also expect that the proposed GSS pre-processing approach can be applied to deep learning approaches as a data augmentation technique.